31 research outputs found

H-DBAS: Alternative splicing database of completely sequenced and manually annotated full-length cDNAs based on H-Invitational

Author: Gojobori Takashi
Imanishi Tadashi
Kuroda Tsuyoshi
Nakao Mitsuteru
Sugano Sumio
Suzuki Yutaka
Takeda Jun-ichi
Publication venue: Oxford University Press
Publication date: 27/11/2006

The Human-transcriptome DataBase for Alternative Splicing (H-DBAS) is a specialized database of alternatively spliced human transcripts. In this database, each alternative splicing (AS) variant corresponds to a completely sequenced and carefully annotated human full-length cDNA, one of those collected for the H-Invitational human-transcriptome annotation meeting. H-DBAS contains a total of 38 664 representative alternative splicing variants (RASVs) in 11 744 loci. The data are retrievable by various manually annotated features of AS, such as AS patterns, the resulting alterations in the encoded amino acids, affected protein motifs, GO terms, predicted subcellular localization signals and transmembrane domains. The database also records recently identified, very complex patterns of AS in which two distinct genes appear to be bridged, nested or degenerated (multiple CDS): in all three cases, completely unrelated proteins are encoded by a single locus. With the AS Viewer, each AS event can be analyzed in the context of full-length cDNAs, helping the user understand empirically the relation between an AS event and the consequent alterations in the encoded amino acid sequences, together with the various affected protein motifs. H-DBAS is accessible at

Crossref

PubMed Central

CyanoBase: the cyanobacteria genome database update 2010

Author: Mitsuteru Nakao
Mitsuyo Kohara
Satoshi Tabata
Shinobu Okamoto
Shusei Sato
Takakazu Kaneko
Takatomo Fujisawa
Tsunakazu Fujishiro
Yasukazu Nakamura
Publication venue: Oxford University Press

CyanoBase (http://genome.kazusa.or.jp/cyanobase) is the genome database for cyanobacteria, which are model organisms for photosynthesis. The database houses cyanobacteria species information, complete genome sequences, genome-scale experiment data, gene information, gene annotations and mutant information. In this version, we updated these datasets and improved the navigation and the visual display of the data views. In addition, a web service API now enables users to retrieve the data seamlessly in various formats for use with other tools.

Crossref

PubMed Central

Large-scale identification and characterization of alternative splicing variants of human gene transcripts using 56 419 completely sequenced and manually annotated full-length cDNAs

Author: Chie Motono
Danielle Thierry-Mieg
Hiroko Hata
Jean Thierry-Mieg
Jun-ichi Takeda
Kanako O. Koyanagi
Kei Yura
Keiichi Nagai
Lihua Jin
Masafumi Shionyu
Mitiko Go
Mitsuteru Nakao
Nobuo Nomura
Roberto A. Barrero
Stefan Wiemann
Sumio Sugano
Tadashi Imanishi
Takao Isogai
Takashi Gojobori
Tetsuji Otsuki
Vladimir Kuryshev
Yutaka Suzuki
Publication venue: Oxford University Press
Publication date: 01/01/2006

We report the first genome-wide identification and characterization of alternative splicing in human gene transcripts based on analysis of full-length cDNAs. Applying both manual and computational analyses to 56 419 completely sequenced and precisely annotated full-length cDNAs selected for the H-Invitational human transcriptome annotation meetings, we identified 6877 alternatively spliced genes with 18 297 different alternative splicing variants. A total of 37 670 exons were involved in these alternative splicing events. The encoded protein sequences were affected in 6005 of the 6877 genes. Notably, alternative splicing affected protein motifs in 3015 genes, subcellular localizations in 2982 genes and transmembrane domains in 1348 genes. We also identified interesting patterns of alternative splicing in which two distinct genes appeared to be bridged, nested or to have overlapping protein coding sequences (CDSs) in different reading frames (multiple CDS). In these cases, completely unrelated proteins are encoded by a single locus. Genome-wide annotations of alternative splicing that rely on full-length cDNAs should lay firm groundwork for exploring in detail the diversification of protein function, which is mediated by the fast-expanding universe of alternative splicing variants.

Crossref

PubMed Central

Queensland University of Technology ePrints Archive

Research Repository

The 2nd DBCLS BioHackathon: interoperable bioinformatics Web services for integrated applications

Author: Aerts Jan
Aoki-Kinoshita Kiyoko F
Arakawa Kazuharu
Aranda Bruno
Bonnal Raoul JP
Chun Hong-Woo
Fernández José M
Fujisawa Takatomo
Gordon Paul MK
Goto Naohisa
Haider Syed
Harris Todd
Hatakeyama Takashi
Ho Isaac
Itoh Masumi
Kasprzyk Arek
Katayama Toshiaki
Kawano Shin
Kawashima Shuichi
Kawashima Takeshi
Kido Nobuhiro
Kim Young-Joo
Kinjo Akira R
Konishi Fumikazu
Kovarskaya Yulia
Labarga Alberto
Limviphuvadh Vachiranee
McCarthy Luke
Nakamura Yasukazu
Nakao Mitsuteru
Nam Yunsun
Nishida Kozo
Nishimura Kunihiro
Nishizawa Tatsuya
Ogishima Soichi
Oinn Tom
Okamoto Shinobu
Okuda Shujiro
Ono Keiichiro
Oshita Kazuki
Park Keun-Joon
Putnam Nicholas
Satoh Noriyuki
Senger Martin
Severin Jessica
Shigemoto Yasumasa
Sugawara Hideaki
Takagi Toshihisa
Taylor James
Trelles Oswaldo
von Kuster Greg
Vos Rutger
Wilkinson Mark D
Yamaguchi Atsuko
Yamamoto Yasunori
Yamasaki Chisato
Yamashita Riu
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The interaction between biological researchers and the bioinformatics tools they use is still hampered by incomplete interoperability between such tools. To ensure interoperability initiatives are effectively deployed, end-user applications need to be aware of, and support, best practices and standards. Here, we report on an initiative in which software developers and genome biologists came together to explore and raise awareness of these issues: BioHackathon 2009. Results: Developers in attendance came from diverse backgrounds, with experts in Web services, workflow tools, text mining and visualization. Genome biologists provided expertise and exemplar data from the domains of sequence and pathway analysis and glyco-informatics. One goal of the meeting was to evaluate the ability to address real world use cases in these domains using the tools that the developers represented. This resulted in i) a workflow to annotate 100,000 sequences from an invertebrate species; ii) an integrated system for analysis of the transcription factor binding sites (TFBSs) enriched based on differential gene expression data obtained from a microarray experiment; iii) a workflow to enumerate putative physical protein interactions among enzymes in a metabolic pathway using protein structure data; iv) a workflow to analyze glyco-gene-related diseases by searching for human homologs of glyco-genes in other species, such as fruit flies, and retrieving their phenotype-annotated SNPs. Conclusions: Beyond deriving prototype solutions for each use-case, a second major purpose of the BioHackathon was to highlight areas of insufficiency. We discuss the issues raised by our exploration of the problem/solution space, concluding that there are still problems with the way Web services are modeled and annotated, including: i) the absence of several useful data or analysis functions in the Web service "space"; ii) the lack of documentation of methods; iii) lack of compliance with the SOAP/WSDL specification among and between various programming-language libraries; and iv) incompatibility between various bioinformatics data formats. Although it was still difficult to solve real world problems posed to the developers by the biological researchers in attendance because of these problems, we note the promise of addressing these issues within a semantic framework.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

DSpace at Rice University

The DBCLS BioHackathon: standardization and interoperability for bioinformatics web services and workflows. The DBCLS BioHackathon Consortium*

Author: Aerts Jan
Aoki-Kinoshita Kiyoko F
Arakawa Kazuharu
Aranda Bruno
Asai Kiyoshi
Barboza Lord Hendrix
Bonnal Raoul JP
Bruskiewich Richard
Bryne Jan C
Chun Hong-Woo
Fernández José M
Funahashi Akira
Gordon Paul MK
Goto Naohisa
Groscurth Andreas
Gutteridge Alex
Holland Richard
Kano Yoshinobu
Katayama Toshiaki
Kawas Edward A
Kawashima Shuichi
Kerhornou Arnaud
Kibukawa Eri
Kinjo Akira R
Kuhn Michael
Lapp Hilmar
Lehvaslaiho Heikki
Nakamura Hiroyuki
Nakamura Yasukazu
Nakao Mitsuteru
Nishizawa Tatsuya
Nobata Chikashi
Noguchi Tamotsu
Oinn Thomas M
Okamoto Shinobu
Ono Keiichiro
Owen Stuart
Pafilis Evangelos
Pocock Matthew
Prins Pjotr
Ranzinger René
Reisinger Florian
Salwinski Lukasz
Schreiber Mark
Senger Martin
Shigemoto Yasumasa
Standley Daron M
Sugawara Hideaki
Takagi Toshihisa
Tashiro Toshiyuki
Trelles Oswaldo
Vos Rutger A
Wilkinson Mark D
Yamaguchi Atsuko
Yamamoto Yasunori
York William
Zmasek Christian M
Publication venue: BioMed Central
Publication date: 01/01/2010

Web services have become a key technology for bioinformatics, since life science databases are globally decentralized and the exponential increase in the amount of available data demands efficient systems that do not need to transfer entire databases for every step of an analysis. However, various incompatibilities among database resources and analysis services make it difficult to connect and integrate these into interoperable workflows. To resolve this situation, we invited domain specialists from web service providers, client software developers, Open Bio* projects, the BioMoby project and researchers from emerging areas where a standard exchange data format is not well established, for an intensive collaboration entitled the BioHackathon 2008. The meeting was hosted by the Database Center for Life Science (DBCLS) and Computational Biology Research Center (CBRC) and was held in Tokyo from February 11th to 15th, 2008. In this report we highlight the work accomplished and the common issues that arose from this event, including the standardization of data exchange formats and services in the emerging fields of glycoinformatics, biological interaction networks, text mining, and phyloinformatics. In addition, common shared object development based on BioSQL, as well as technical challenges in large data management, asynchronous services, and security are discussed. Consequently, we improved interoperability of web services in several fields; however, further cooperation among major database centers and continued collaborative efforts between service providers and software developers are still necessary for an effective advance in bioinformatics web service technologies.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Wageningen University & Research Publications

eScholarship - University of California

The 3rd DBCLS BioHackathon: improving life science data integration with Semantic Web technologies.

Author: Aerts Jan
Afzal Hammad
Antezana Erick
Arakawa Kazuharu
Aranda Bruno
Asai Kiyoshi
Belleau Francois
Bolleman Jerven
Bonnal Raoul JP
Chapman Brad
Chun Hong-Woo
Cock Peter JA
Eriksson Tore
Gordon Paul MK
Goto Naohisa
Hayashi Kazuhiro
Horn Heiko
Ishiwata Ryosuke
Kaminuma Eli
Kasprzyk Arek
Katayama Toshiaki
Kawaji Hideya
Kawamoto Shoko
Kawashima Shuichi
Kido Nobuhiro
Kim Young Joo
Kinjo Akira R
Konishi Fumikazu
Kwon Kyung-Hoon
Labarga Alberto
Lamprecht Anna-Lena
Lin Yu
Lindenbaum Pierre
McCarthy Luke
Micklem Gos
Morita Hideyuki
Murakami Katsuhiko
Nagao Koji
Nakao Mitsuteru
Nishida Kozo
Nishimura Kunihiro
Nishizawa Tatsuya
Ogishima Soichi
Okamoto Shinobu
Okubo Kosaku
Ono Keiichiro
Oouchida Kenta
Oshita Kazuki
Park Keun-Joon
Prins Pjotr
Saito Taro L
Samwald Matthias
Satagopam Venkata P
Shigemoto Yasumasa
Smith Richard
Splendiani Andrea
Sugawara Hideaki
Takagi Toshihisa
Taylor James
Vos Rutger A
Wilkinson Mark D
Withers David
Yamaguchi Atsuko
Yamamoto Yasunori
Yamasaki Chisato
Zmasek Christian M
Publication venue: J Biomed Semantics
Publication date: 01/01/2013

BACKGROUND: BioHackathon 2010 was the third in a series of meetings hosted by the Database Center for Life Science (DBCLS) in Tokyo, Japan. The overall goal of the BioHackathon series is to improve the quality and accessibility of life science research data on the Web by bringing together representatives from public databases, analytical tool providers, and cyber-infrastructure researchers to jointly tackle important challenges in the area of in silico biological research. RESULTS: The theme of BioHackathon 2010 was the 'Semantic Web', and all attendees gathered with the shared goal of producing Semantic Web data from their respective resources, and/or consuming or interacting with those data using their tools and interfaces. We discussed topics including guidelines for designing semantic data and interoperability of resources. We consequently developed tools and clients for analysis and visualization. CONCLUSION: We provide a meeting report from BioHackathon 2010, in which we describe the discussions, decisions, and breakthroughs made as we moved towards compliance with Semantic Web technologies - from source provider, through middleware, to the end-consumer. RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license, which is similar to the 'Creative Commons Attribution Licence'. In brief, you may copy, distribute, and display the work; make derivative works; or make commercial use of the work, under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.

Crossref

Springer - Publisher Connector

PubMed Central

Copenhagen University Research Information System

eScholarship - University of California

Apollo (Cambridge)

BioHackathon series in 2011 and 2012: penetration of ontology and linked data in life science domains

Author: Aerts Jan
Akune Yukie
Antezana Erick
Aoki-Kinoshita Kiyoko F
Arakawa Kazuharu
Aranda Bruno
Baran Joachim
Bolleman Jerven
Bonnal Raoul JP
Bono Hidemasa
Buttigieg Pier Luigi
Campbell Matthew P
Chen Yi-an
Chiba Hirokazu
Cock Peter JA
Cohen K Bretonnel
Constantin Alexandru
Duck Geraint
Dumontier Michel
Fujisawa Takatomo
Fujiwara Toyofumi
Goto Naohisa
Hoehndorf Robert
Igarashi Yoshinobu
Itaya Hidetoshi
Ito Maori
Iwasaki Wataru
Kalaš Matúš
Kano Yoshinobu
Katayama Toshiaki
Katoda Takeo
Kawamoto Shoko
Kawano Shin
Kawashima Shuichi
Kim Jin-Dong
Kim Taehong
Kocbek Simon
Kokubu Anna
Komiyama Yusuke
Kotera Masaaki
Laibe Camille
Lapp Hilmar
Lütteke Thomas
Marshall M Scott
Mori Hiroshi
Mori Takaaki
Morita Mizuki
Murakami Katsuhiko
Nakao Mitsuteru
Narimatsu Hisashi
Nishide Hiroyo
Nishimura Yosuke
Nystrom-Persson Johan
Ogishima Soichi
Okamoto Shinobu
Okamura Yasunobu
Okuda Shujiro
Ono Hiromasa
Oshita Kazuki
Packer Nicki H
Prins Pjotr
Ranzinger Rene
Rocca-Serra Philippe
Sansone Susanna
Sawaki Hiromichi
Shin Sung-Ho
Splendiani Andrea
Strozzi Francesco
Tadaka Shu
Takagi Toshihisa
Toukach Philip
Uchiyama Ikuo
Umezaki Masahito
Vos Rutger
Wang Yue
Whetzel Patricia L
Wilkinson Mark D
Wu Hongyan
Yamada Issaku
Yamaguchi Atsuko
Yamamoto Yasunori
Yamasaki Chisato
Yamashita Riu
York William S
Zmasek Christian M
Publication date: 01/01/2014

The application of semantic technologies to the integration of biological data and the interoperability of bioinformatics analysis and visualization tools has been the common theme of a series of annual BioHackathons hosted in Japan for the past five years. Here we provide a review of the activities and outcomes from the BioHackathons held in 2011 in Kyoto and 2012 in Toyama. In order to efficiently implement semantic technologies in the life sciences, participants formed various sub-groups and worked on the following topics: Resource Description Framework (RDF) models for specific domains, text mining of the literature, ontology development, essential metadata for biological databases, platforms to enable efficient Semantic Web technology development and interoperability, and the development of applications for Semantic Web data. In this review, we briefly introduce the themes covered by these sub-groups and discuss the observations made, the conclusions drawn, and the software development projects that emerged from these activities.

Maastricht University Research Portal

University of Bergen

Aberystwyth Research Portal

Springer - Publisher Connector

PubMed Central

Electronic Publication Information Center

NORA - Norwegian Open Research Archives

Macquarie University ResearchOnline

Access to Research at National University of Ireland, Galway

Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones

Author: Amid Clara
Apweiler Rolf
Ashurst Jennifer
Auffray Charles
Barrero Roberto A
Bellgard Matthew
Bonaldo Maria de Fatima
Bono Hidemasa
Bromberg Susan K
Brookes Anthony J
Bruford Elspeth
Carninci Piero
Chakraborty Ranajit
Chelala Claude
Chen Zhu
Couillault Christine
Debily Marie-Anne
Devignes Marie-Dominique
Dubchak Inna
Endo Toshinori
Estreicher Anne
Eveno Eric
Eyras Eduardo
Fujii Yasuyuki
Fukami-Kobayashi Kaoru
Fukuchi Satoshi
Go Mitiko
Gojobori Takashi
Gough Craig
Graudens Esther
Hahn Yoonsoo
Han Michael
Han Ze-Guang
Hanada Kousuke
Hanaoka Hideki
Harada Erimi
Hashimoto Katsuyuki
Hayashizaki Yoshihide
Hide Winston
Hilton Phillip
Hinz Ursula
Hirai Momoki
Hirakawa Mika
Hishiki Teruyoshi
Homma Keiichi
Hopkinson Ian
Ikeo Kazuho
Imanishi Tadashi
Imbeaud Sandrine
Inoko Hidetoshi
Isogai Takao
Itoh Takeshi
Jia Libin
Jin Lihua
Kanapin Alexander
Kanehisa Minoru
Kaneko Yayoi
Karavidopoulou Youla
Kasprzyk Arek
Kasukawa Takeya
Kelso Janet
Kersey Paul
Kikuno Reiko
Kim Sangsoo
Kimura Kouichi
Korn Bernhard
Koyanagi Kanako O
Kuryshev Vladimir
Lenhard Boris
Makalowska Izabela
Makalowski Wojciech
Makino Takashi
Mano Shuhei
Mariage-Samson Regine
Mashima Jun
Matsuda Hideo
Mewes Hans-Werner
Minoshima Shinsei
Miyazaki Satoru
Mulder Nicola
Nagai Keiichi
Nagasaki Hideki
Nagata Naoki
Nakai Kenta
Nakao Mitsuteru
Nigam Rajni
Nishikawa Ken
Nishikawa Tetsuo
Nomura Nobuo
O'Donovan Claire
Ogasawara Osamu
Ohara Osamu
Ohtsubo Masafumi
Oishi Michio
Okada Norihiro
Okazaki Yasushi
Okido Toshihisa
Okubo Kousaku
Oota Satoshi
Ota Motonori
Ota Toshio
Otsuki Tetsuji
Piatier-Tonneau Dominique
Poustka Annemarie
Quackenbush John
Gopinath Gopal R
Ren Shuang-Xi
Richard Roberts
Saitou Naruya
Sakai Hiroaki
Sakai Katsunaga
Sakaki Yoshiyuki
Sakamoto Shigetaka
Sakate Ryuichi
Schupp Ingo
Servant Florence
Sherry Stephen
Shiba Rie
Shimizu Nobuyoshi
Shimoyama Mary
Simpson Andrew J
Soares Bento
Souza Sandro J. de
Steward Charles
Stodolsky Marvin
Strausberg Robert L
Sugano Sumio
Sugawara Hideaki
Suwa Makiko
Suzuki Mami
Suzuki Yoshiyuki
Suzuki Yutaka
Takagi Toshihisa
Takahashi Aiko
Takeda Jun-ichi
Tamiya Gen
Tamura Takuro
Tanaka Hiroshi
Tanaka Susumu
Tanino Motohiko
Tateno Yoshio
Taylor Todd
Terwilliger Joseph D
Thierry-Mieg Danielle
Thierry-Mieg Jean
Thomas Michael A
Tonellato Peter
Unneberg Per
Veeramachaneni Vamsi
Wagner Lukas
Watanabe Shinya
Wiemann Stefan
Wilming Laurens
Yamaguchi-Kabata Yumi
Yamasaki Chisato
Yasuda Norikazu
Yasuda Tomohiro
Yoo Hyang-Sook
Yura Kei
Publication venue: Public Library of Science
Publication date: 01/01/2004

The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. 
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Queensland University of Technology ePrints Archive

Research Repository

Hokkaido University Collection of Scholarly and Academic Papers

UPF Digital Repository

White Rose Research Online

MPG.PuRe

Integrative annotation of 21,037 human genes validated by full-length cDNA clones.

Author: Amid Clara
Ashurst Jennifer
Barrero Roberto A.
Bellgard Matthew
Bono Hidemasa
Bromberg Susan K.
Brookes Anthony J.
Bruford Elspeth
Carninci Piero
Chelala Claude
Couillault Christine
de Fatima Bonaldo Maria
de Souza Sandro J.
Debily Marie-Anne
Devignes Marie-Dominique
Dubchak Inna
Endo Toshinori
Estreicher Anne
Eveno Eric
Eyras Eduardo
Fujii Yasuyuki
Fukami-Kobayashi Kaoru
Fukuchi Satoshi
Gopinath Gopal R.
Gough Craig
Graudens Esther
Hahn Yoonsoo
Han Michael
Han Ze-Guang
Hanada Kousuke
Hanaoka Hideki
Harada Erimi
Hashimoto Katsuyuki
Hilton Phillip
Hinz Ursula
Hirai Momoki
Hirakawa Mika
Hishiki Teruyoshi
Homma Keiichi
Hopkinson Ian
Ikeo Kazuho
Imanishi Tadashi
Imbeaud Sandrine
Inoko Hidetoshi
Itoh Takeshi
Jia Libin
Jin Lihua
Kanapin Alexander
Kaneko Yayoi
Karavidopoulou Youla
Kasprzyk Arek
Kasukawa Takeya
Kelso Janet
Kersey Paul
Kikuno Reiko
Kim Sangsoo
Kimura Kouichi
Korn Bernhard
Koyanagi Kanako O.
Kuryshev Vladimir
Lenhard Boris
Makalowska Izabela
Makino Takashi
Mano Shuhei
Mariage-Samson Regine
Mashima Jun
Matsuda Hideo
Mewes Hans-Werner
Minoshima Shinsei
Miyazaki Satoru
Mulder Nicola
Nagai Keiichi
Nagasaki Hideki
Nagata Naoki
Nakao Mitsuteru
Nigam Rajni
Nishikawa Tetsuo
O'Donovan Claire
Ogasawara Osamu
Ohara Osamu
Ohtsubo Masafumi
Okada Norihiro
Okido Toshihisa
Oota Satoshi
Ota Motonori
Ota Toshio
Otsuki Tetsuji
Piatier-Tonneau Dominique
Poustka Annemarie
Ren Shuang-Xi
Saitou Naruya
Sakai Hiroaki
Sakai Katsunaga
Sakamoto Shigetaka
Sakate Ryuichi
Schupp Ingo
Servant Florence
Sherry Stephen
Shiba Rie
Sugano Sumio
Suzuki Yoshiyuki
Suzuki Yutaka
Takeda Jun-ichi
Tamura Takuro
Tanaka Susumu
Tanino Motohiko
Thierry-Mieg Danielle
Thierry-Mieg Jean
Thomas Michael A.
Yamaguchi-Kabata Yumi
Yamasaki Chisato
Yasuda Tomohiro
Yura Kei
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2004

The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions.
We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.

INRIA a CCSD electronic archive server